AIML Capstone Project: Automatic Ticket Assignment (NLP)

Problem Statement


One of the key activities of any IT function is to "keep the lights on" and ensure there is no impact to business operations. IT leverages the Incident Management process to achieve this objective. An incident is an unplanned interruption to an IT service, or a reduction in its quality, that affects users and the business. The main goal of the Incident Management process is to provide a quick fix, workaround, or solution that resolves the interruption and restores the service to full capacity, ensuring no business impact. In most organizations, incidents are created by various business and IT users, by end users and vendors if they have access to the ticketing system, and by integrated monitoring systems and tools. Assigning incidents to the appropriate person or unit in the support team is critically important for improving user satisfaction while ensuring better allocation of support resources. In many IT organizations, this assignment is still a manual process. Manual assignment is time consuming, requires human effort, and is prone to human error; misrouted tickets waste support resources. It also increases response and resolution times, which results in deteriorating user satisfaction and poor customer service.

Business Domain Value



In the support process, incoming incidents are analyzed and assessed by the organization's support teams to fulfill the request. In many organizations, better allocation and more effective use of valuable support resources translates directly into substantial cost savings. Currently, incidents are created by various stakeholders (business users, IT users, and monitoring tools) within the IT Service Management tool and are assigned to Service Desk teams (L1/L2 teams). These teams review each incident for correct categorization and priority, then carry out an initial diagnosis to see if they can resolve it. Around ~54% of incidents are resolved by the L1/L2 teams. If L1/L2 are unable to resolve an incident, they escalate/assign it to functional teams from Applications and Infrastructure (L3 teams). Some incidents are assigned directly to L3 teams by monitoring tools or by callers/requestors. L3 teams carry out a detailed diagnosis and resolve the incident; around ~56% of incidents are resolved by these functional/L3 teams. If vendor support is needed, they engage the vendor to work toward incident closure. L1/L2 teams must spend time reviewing Standard Operating Procedures (SOPs) before assigning tickets to functional teams: a minimum of ~25-30% of incidents need an SOP review before assignment, at about 15 minutes per incident, amounting to a minimum of ~1 FTE of effort just for incident assignment to L3 teams.

During the assignment of incidents by L1/L2 teams to functional groups, there were multiple instances of incidents being assigned to the wrong functional group; around ~25% of incidents are misassigned. Functional teams then need additional effort to re-assign them to the right group. Meanwhile, some incidents sit in a queue and are not addressed in a timely manner, resulting in poor customer service.

Powerful AI techniques that can classify incidents to the right functional groups can help organizations reduce issue resolution times and let support staff focus on more productive tasks.

Project Description & Dataset

Details about the data and dataset files are given in the link below: https://drive.google.com/open?id=1OZNJm81JXucV3HmZroMq6qCT2m7ez7IJ

Goal

In this capstone project, the goal is to build a classifier that can classify the tickets by analyzing text.

Project Objectives

The objectives of the project are:

Load the dataset & Libraries

Here the dataset is an Excel file; we use the pandas library to load it into a pandas DataFrame.
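A minimal sketch of this step. The file name `input_data.xlsx` and the tiny sample rows are illustrative assumptions (the frame is round-tripped through Excel so the example is self-contained); the column names come from the feature description later in this report.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset -- substitute the actual file path.
df_demo = pd.DataFrame({
    "Short description": ["login error", "password reset"],
    "Description": ["cannot log in to vpn", "reset needed for account"],
    "Caller": ["user_1", "user_2"],
    "Assignment group": ["GRP_0", "GRP_0"],
})
df_demo.to_excel("input_data.xlsx", index=False)

# Load the Excel dataset into a pandas DataFrame.
df = pd.read_excel("input_data.xlsx")
print(df.shape)           # (rows, columns)
print(df.isnull().sum())  # null counts per column, useful for the EDA step
```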

Pre-Processing(Feature Engineering and Selection), Data Visualization and EDA

Why is data pre-processing required?

In any machine learning process, data preprocessing is the step in which the data is transformed, or encoded, to bring it to a state that the machine can easily parse. In other words, the features of the data can then be easily interpreted by the algorithm.

What is Feature Engineering and selection?

Feature engineering: The process of creating new features from raw data to increase the predictive power of the learning algorithm. Engineered features should capture additional information that is not easily apparent in the original feature set.

Feature selection: The process of selecting the key subset of features to reduce the dimensionality of the training problem.

Why is data visualization required for EDA?

Data visualization provides a high-level interface for drawing attractive and informative statistical graphics. It is an important part of analysis, since it allows even non-programmers to decipher trends and patterns.

Here, we start by identifying the basic traits of the data, i.e. its rows and columns.

Feature Description

  1. Short description: the ticket issue title or a short description of the issue. Sometimes the issue can be understood from the short description alone.

  2. Description: Detailed explanation of issue and the scenario.

  3. Caller: the person who raised the ticket, or raised it on behalf of someone.

  4. Assignment group: The group/category to which the ticket is assigned.

Initial observations of the dataset

  1. We observe that not all columns have 8500 non-null values, which means there are null values in the data that we have to take care of. We can also observe that the data is highly imbalanced and skewed.

  2. Total records: 8500 & Total Attributes: 4

  3. Since our goal is automatic ticket assignment, the outcome doesn't depend on the caller; a caller can raise tickets for any issue he/she is facing. Therefore, we can ignore the caller attribute/feature in further analysis.

  4. There are 8 records with null value in short description and 1 record with null value in description.

  5. One user/caller has raised 810 tickets.

  6. There are 74 unique groups; the group GRP_0 has been assigned 3976 tickets, the maximum, accounting for around ~40% of all instances.

  7. The most frequent description is just the word 'the', which we also have to take care of.

  8. The short description and description counts don't match the total number of callers or assigned groups.

  9. Password reset is one of the most frequently occurring ticket topics.


Now we drop the caller attribute from the dataset, since it doesn't affect our task of automatic ticket assignment, and then find the unique groups present in the data.
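This step can be sketched with a toy frame standing in for the ticket dataset (the sample rows and group labels here are illustrative, only the column names come from the report):

```python
import pandas as pd

df = pd.DataFrame({
    "Short description": ["login error", "job failed", "password reset"],
    "Description": ["cannot log in", "nightly job aborted", "reset my password"],
    "Caller": ["user_1", "user_2", "user_1"],
    "Assignment group": ["GRP_0", "GRP_8", "GRP_0"],
})

# Drop the Caller attribute -- it does not influence the assignment decision.
df = df.drop(columns=["Caller"])

# Find the unique assignment groups present in the data.
groups = df["Assignment group"].unique()
print(groups)
```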


Here we are creating a dataframe which shows us the percentage of data present in the given groups

The plot above shows the percentage of data distribution across the different groups within the data. The data below shows the top 20 groups with the most records.

Here we have the plot of the top 20 groups that have the data assigned to them.

Below, we check the groups that have the fewest records assigned.

Here we have the plot of the bottom 20 groups that have the data assigned to them.

In the dataframe above, we identify the number of groups that have tickets assigned to them in specific ranges. For example, there are 6 groups with just 1 ticket assigned to them.

Null Value Treatment.

  1. Drop the missing values: we can use this if the number of complete cases (observations with no missing data) is sufficient for the selected analysis technique. However, in our case we already have quite skewed data with a long tail, so dropping null rows could mean missing out on some groups entirely.
  2. Imputation: if the missing values in a column or feature are numerical, they can be imputed with the mean or median of the complete cases of the variable. In our case, however, the values are character strings.
  3. Conjoining / custom replacement: cases where we can find another custom solution that gives a logical value to the nulls. Fortunately, in our case we observed that wherever we have NaN in the description we have a proper value in the short description, and vice versa. Hence we can replace the NaNs and combine the short description and description into one readable column.

Thus, to eliminate the null values, we replace them with a space.


Joining the two columns short description and description.

Here, we join the two columns into a new column. Because of the null replacement above, some of the short description/description values are just spaces; joining the columns lets us refer to a single column when building the vocabulary.

We do this concatenation here, to help with data pre-processing & data cleansing.
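The null replacement and concatenation can be sketched as follows (the column name `combined_description` is an illustrative choice, not necessarily the name used in the notebook):

```python
import pandas as pd

df = pd.DataFrame({
    "Short description": ["login error", None],
    "Description": [None, "reset my password"],
})

# Replace NaN with empty strings, then concatenate the two text columns
# so there is a single description field to build the vocabulary from.
df["combined_description"] = (
    df["Short description"].fillna("") + " " + df["Description"].fillna("")
).str.strip()
print(df["combined_description"].tolist())
```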

Later we will remove the repeated words in each combined description.

Creating a wordcloud.

A wordcloud is an image composed of words used in a particular text or subject, in which the size of each word indicates its frequency or importance. Here in the below wordcloud, we intend to do the same.

Wordcloud image of the description.

Let's view the word clouds of the top 4 assignment groups to see the kind of tickets assigned to them.

Word Cloud for tickets with Assignment group 'GRP_0'.

Word Cloud for tickets with Assignment group 'GRP_8'.

GRP_8 seems to have tickets related to outages, job failures, monitoring tools, etc.

Word Cloud for tickets with Assignment group 'GRP_12'

GRP_12 contains tickets related to system issues like disk space, and network issues like timeouts, Citrix problems, and connectivity failures.

Word Cloud for tickets with Assignment group 'GRP_24'.

GRP_24: tickets are mainly in German; these tickets need to be translated to English before being passed to our model.

It seems there are a few tickets with descriptions in another language, probably German.

Data Encoding and translation

Text encoding transforms words into numbers and texts into number vectors. In the given dataset we also found many entries/records in multiple languages, so we need to translate them into one language that is easily understandable; here we choose English. There are also special characters in some records, which we will handle as well.

Language Detection

In the code below, we use the langdetect library to detect the languages used in the dataset.

Language Translation

We can see that most of the tickets are in English, followed by tickets in German. We need to translate these into English; we will use the googletrans package for translation.

The Google Translate API is used for translating the non-English text. However, garbage values and non-ASCII symbols impose limits and can prevent proper translation.

So the translation was done in 2 batches:
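The batching idea can be sketched as follows. The googletrans package and its `Translator` API are the assumed translation client; the translation call is wrapped in a function because it needs network access, while the offline batching helper is shown directly:

```python
def chunk(records, batch_size):
    """Split a list of texts into fixed-size batches for translation."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

def translate_batch(texts, dest="en"):
    """Translate one batch to English with googletrans (requires network access)."""
    from googletrans import Translator  # assumed package: pip install googletrans
    translator = Translator()
    return [translator.translate(t, dest=dest).text for t in texts]

# Five hypothetical records split into batches of two.
batches = chunk(["text %d" % i for i in range(5)], batch_size=2)
print([len(b) for b in batches])  # [2, 2, 1]
```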

Exploring the language distribution in the dataset

Group-wise Language Distribution

Observation:

From the table and bar chart above, we can observe that the languages are distributed across groups and are not specific to certain groups alone.

Data Cleansing

The various steps followed while preprocessing the data:

  1. Lowercasing: converting the words into lower case (NLU -> nlu). Words with the same meaning, like nlp and NLP, would be treated as non-identical words in the vector space model if they were not converted to lowercase.

  2. Stop-word removal: these are the most frequently used words, which carry no significance in distinguishing two documents (a, an, the, etc.), so they are removed.

  3. Contextual conversational-word removal: in our case, words like 'received from', 'to', 'regards', 'subject', and 'email address', which are identified as words used in standard email conversations, are removed.

  4. Punctuation removal: the text has several punctuation marks. Punctuation is often unnecessary, as it adds no value or meaning to the NLP model.

  5. Other steps: further cleaning steps can be performed based on the data, e.g. removing URLs, HTML tags, numbers, and hashtags; these are also used here.
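The cleaning steps above can be sketched as a single function; the stop-word and email-word sets here are tiny illustrative samples, not the full lists used in the notebook:

```python
import re
import string

STOP_WORDS = {"a", "an", "the", "is", "to", "from"}   # small illustrative set
EMAIL_WORDS = {"received", "regards", "subject"}      # conversational boilerplate

def clean_text(text):
    text = text.lower()                                # 1. lowercasing
    text = re.sub(r"https?://\S+", " ", text)          # 5. remove URLs
    text = re.sub(r"<[^>]+>", " ", text)               # 5. remove HTML tags
    text = re.sub(r"\d+", " ", text)                   # 5. remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # 4. punctuation
    tokens = [w for w in text.split()
              if w not in STOP_WORDS and w not in EMAIL_WORDS]  # 2 & 3
    return " ".join(tokens)

print(clean_text("Subject: Password RESET needed!! See http://helpdesk 123 <b>now</b>"))
# -> "password reset needed see now"
```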

Language Translation after cleaning & before stop words removal & tokenization

Remove duplicates in combined descriptions

This step was necessary to remove the duplicate words which were formed due to the concatenation of short description & long description columns of the dataset.

These duplicate words would increase the word counts and possibly impact the model building steps and the performance of models.

Deriving n-grams

N-grams of texts are extensively used in text mining and natural language processing tasks.

They are basically a set of co-occurring words within a given window. In computational linguistics and probability, an n-gram is a contiguous sequence of N items from a given sample of text or speech. The items can be phonemes, syllables, letters, words, or base pairs depending on the application. N-grams describe the number of words used as observation points: a unigram is a single word, a bigram a 2-word phrase, and a trigram a 3-word phrase. We'll write a generic method using scikit-learn's CountVectorizer to derive the n-grams.
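A generic method of this sort, sketched with an invented three-sentence corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(corpus, n, top_k=3):
    """Return the top_k most frequent n-grams in the corpus as (ngram, count) pairs."""
    vec = CountVectorizer(ngram_range=(n, n)).fit(corpus)
    counts = vec.transform(corpus).sum(axis=0)   # total count per n-gram
    freqs = [(word, counts[0, idx]) for word, idx in vec.vocabulary_.items()]
    return sorted(freqs, key=lambda x: -x[1])[:top_k]

corpus = ["password reset request", "password reset failed", "vpn login failed"]
print(top_ngrams(corpus, n=2))  # 'password reset' occurs twice, the rest once
```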

Lemmatization

1. Lemmatization is the process of reducing a word to its root form by grouping together its different inflected forms so they can be analysed as a single item.

2.It helps to reduce variations of the same word, thereby reducing the corpus of words to be included in the model.

So it returns the base or dictionary form of a word, known as the lemma. This is important when cleaning the data of all forms of a given root. Lemmatization considers the context of the word and shortens it to its root form based on the dictionary definition.

Observations

It is clear from the n-gram analysis and the word cloud that in the dataset, most issues are related to:

Sample analysis on GRP_0: it is the most frequent group, and most of the tickets assigned to it show that this group deals mostly with maintenance problems such as password resets, account locks, login issues, ticket updates, etc.

Much of the human intervention on GRP_0 tickets could be reduced by putting automation scripts/mechanisms in place to resolve these common maintenance issues. This would lower the inflow of service tickets needing human intervention, thereby saving resource hours and reducing the cost of manual effort.

Here we remove from our dataframe those records for which the cleaned description has fewer than 2 words.

Hence we have considered those records where the 'num_wds' is greater than 1

The new dataframe has 8424 records now.

76 records had fewer than 2 words and were therefore removed.
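The filtering step above can be sketched as follows; the column names `cleaned_description` and `num_wds` follow the report's naming, while the sample rows are invented:

```python
import pandas as pd

df = pd.DataFrame({"cleaned_description": ["password reset", "ok", "vpn login failed"]})

# Count words in each cleaned description and keep rows with at least 2 words.
df["num_wds"] = df["cleaned_description"].str.split().str.len()
df_filtered = df[df["num_wds"] > 1].reset_index(drop=True)
print(len(df_filtered))  # the one-word row "ok" is dropped
```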

Visualize the distribution of the description word counts to see how skewed our average might be by outliers. Let's generate another plot to take a look.

Number of unique words in each ticket

The average (mean) number of unique words per incident, and the minimum and maximum unique word counts.

When we plot this in a chart, we can see that the distribution of unique words is not skewed.

Mean Number of Words in tickets per Assignment Group

Mean Number of Unique Words in tickets per Assignment Group

Finally, let’s look at the most common words over the entire corpus.

Tokenization

Tokenization is a process that splits an input sequence into so-called tokens where the tokens can be a word, sentence, paragraph etc.

Let's create a copy of the clean df for modeling purpose.

Topic Modeling & LDA

What is topic modelling?

Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives. It can help with the following:

For example, let’s say a document belongs to the topics food, dogs and health. So if a user queries “dog food”, they might find the above-mentioned document relevant because it covers those topics(among other topics). We are able to figure its relevance with respect to the query without even going through the entire document.

Therefore, by annotating the document, based on the topics predicted by the modeling method, we are able to optimize our search process.

Topic Modeling

Here for topic modeling, we are using Gensim.

Gensim ("Generate Similar") is a popular open-source natural language processing (NLP) library used for unsupervised topic modeling. It uses top academic models and modern statistical machine learning to perform various complex tasks.

Apart from performing the above complex tasks, Gensim, implemented in Python and Cython, is designed to handle large text collections using data streaming as well as incremental online algorithms. This makes it different from those machine learning software packages that target only in-memory processing.

Note : Bigram is 2 consecutive words in a sentence. Trigram is 3 consecutive words in a sentence.

LDA - Latent Dirichlet Allocation

It is one of the most popular topic modeling methods. Each document is made up of various words, and each topic also has various words belonging to it. The aim of LDA is to find topics a document belongs to, based on the words in it.

In natural language processing, perplexity is a way of evaluating language models.

A language model is a probability distribution over entire sentences or texts.

A low perplexity indicates the probability distribution is good at predicting the sample.

What is pyLDavis?

pyLDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

Visualize data distribution in the dataset

Even after data cleaning process we can observe a lot of imbalance in the data distribution.

This imbalance in the dataset demands for further processing of the dataset with proper measures like resampling to build better performing prediction models.

Build Models & Evaluate

Overview of this step:

Traditional ML Models - without resampling

Traditional ML Models considered:

Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the value of the class variable. Naive Bayes classifiers can be extremely fast compared to more sophisticated methods.
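A sketch of a Naive Bayes ticket classifier as a scikit-learn pipeline; the TF-IDF feature step and the six toy training examples are illustrative assumptions, not the notebook's exact setup:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

X = ["password reset please", "reset my password", "job failed overnight",
     "nightly job aborted", "password locked out", "batch job error"]
y = ["GRP_0", "GRP_0", "GRP_8", "GRP_8", "GRP_0", "GRP_8"]

# TF-IDF features feeding a multinomial Naive Bayes classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
model.fit(X, y)
print(model.predict(["forgot my password"]))
```

The same pipeline shape works for the other traditional models in this section: swap `MultinomialNB()` for `LinearSVC()`, `LogisticRegression()`, etc.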

K-Nearest Neighbor (KNN)- Without Sampling

The principle behind nearest-neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. Being a non-parametric method, it is often successful in classification situations where the decision boundary is very irregular.

Support Vector Machine (SVM)- Without Sampling

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

Linear SVM

The algorithm creates a line or a hyperplane which separates the data into classes.

The advantages of support vector machines are:

Decision Trees- Without Sampling

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.

Random Forest Classifier- Without Sampling

A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. Essentially, Random Forest is a good model if we want high performance with less need for interpretation.

Logistic Regression- Without Sampling

Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).

ADA Boost Classifier - Without Sampling

An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Bagging Classifier - Without Sampling

A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

XG Boost Classifier-Without Sampling

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many problems in a fast and accurate way.

Comparison of Traditional ML Models - without Sampling

Observation:

We first analysed the dataset provided to us and understood the structure of the data: number of columns, fields, datatypes, etc.

We did exploratory data analysis to derive further insights from this dataset and found that the data is very imbalanced; around ~45% of the groups have fewer than 20 tickets.

A few of the tickets are in a foreign language, such as German. The data also has a lot of noise in it; for example, tickets related to account setup are spread across multiple assignment groups. We performed data cleaning, Google translation, and preprocessing.

In this comparison of different traditional ML models, we can observe a substantial difference between training and test set accuracy. The major reason is likely the imbalanced data distribution of the dataset and the models' inability to learn and adapt during training.

We need to check whether these issues can be handled with resampled data and deep learning techniques.

Model Improvement 1: Utilizing Deep Learning Techniques

Deep Learning Models - without Sampling

Deep learning is a subset of machine learning in which artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data. Similarly to how we learn from experience, a deep learning algorithm performs a task repeatedly, each time tweaking it a little to improve the outcome. We refer to 'deep learning' because the neural networks have many (deep) layers that enable learning. Deep learning models considered:

Importing needed libraries to create the model:

  1. Importing the Sequential model
  2. Importing the Input layer, Dropout, Flatten, Dense, and Embedding layer modules, plus the LSTM and GRU modules
  3. Importing the BatchNormalization, TimeDistributed, Conv1D, MaxPooling1D, and SpatialDropout1D modules
  4. Importing the text Tokenizer module and the concatenate module

Embedding Matrix creation:

Load the GloVe embeddings in the model

The embedding layer has a single weight matrix: a 2D float matrix where entry i is the word vector associated with index i. Simple enough. Load the GloVe matrix you prepared into the embedding layer, the first layer in the model.
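Building that weight matrix can be sketched as follows; a tiny in-memory dict stands in for the vectors parsed from a real `glove.*.txt` file, and the 4-dimensional vectors and sample `word_index` are illustrative assumptions:

```python
import numpy as np

EMBEDDING_DIM = 4  # real GloVe vectors are 50-300 dims; 4 keeps the sketch small

# Stand-in for the dict parsed from a GloVe file: word -> vector.
embeddings_index = {
    "password": np.array([0.1, 0.2, 0.3, 0.4]),
    "reset":    np.array([0.5, 0.1, 0.0, 0.2]),
}

# word_index as produced by a Keras Tokenizer (indices start at 1; 0 = padding).
word_index = {"password": 1, "reset": 2, "vpn": 3}

embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:          # words missing from GloVe stay all-zeros
        embedding_matrix[i] = vector

print(embedding_matrix.shape)  # (4, 4): vocab size + 1 rows, EMBEDDING_DIM columns
```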

Recurrent Neural Network(RNN) - LSTM Model - without Sampling

Defining the model can be done with below points:

  1. we've used Keras' Sequential() to instantiate a model. It takes a group of sequential layers and stacks them together into a single model. Into the Sequential() constructor, we pass a list that contains the layers we want to use in our model.

  2. We've made several Dense layers and a single Dropout layer in this model. We've made the input_shape equal to the maximum sequence length.

  3. We defined Embedding Layer on the first layer as the input of that layer.

  4. There are 200 neurons in the convolution layer, with relu as the activation function, and 128 neurons in the LSTM layer.

  5. There are 100 neurons in the dense layer, with relu as the activation function. This is typically determined by testing: more neurons per layer can help extract more features, but can also sometimes work against the goal.

  6. Finally, we have a Dense layer with size as the output layer. It has the Softmax activation function.

  7. Finally, we measure the loss with the categorical cross-entropy function; the efficient Adam optimization algorithm is used to find the weights, and the accuracy metric is calculated and reported each epoch.

We also add some callbacks to the model for early stopping and reducing the learning rate on plateau if we see no improvement in the loss.
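The steps above can be sketched as follows. The vocabulary size, sequence length, and dropout rate are assumed values (the report does not state them); the layer widths (200 convolution filters, 128 LSTM units, 100 dense neurons) and the 74 output groups come from the text.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     LSTM, Dense, Dropout)
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

MAX_SEQUENCE_LENGTH = 100   # assumed padding length
VOCAB_SIZE = 5000           # assumed vocabulary size
EMBEDDING_DIM = 100
NUM_CLASSES = 74            # number of assignment groups in the dataset

model = Sequential([
    Input(shape=(MAX_SEQUENCE_LENGTH,)),
    Embedding(VOCAB_SIZE, EMBEDDING_DIM),        # first layer: word embeddings
    Conv1D(200, 3, activation="relu"),           # 200 filters, relu activation
    MaxPooling1D(2),
    LSTM(128),                                   # 128 units in the LSTM layer
    Dense(100, activation="relu"),               # 100-neuron dense layer
    Dropout(0.5),                                # assumed dropout rate
    Dense(NUM_CLASSES, activation="softmax"),    # one output per group
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]
# model.fit(X_train, y_train, validation_split=0.2, epochs=20, callbacks=callbacks)
print(model.output_shape)
```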

Performance of the RNN with LSTM

Recurrent Neural Network(RNN) - GRU Model - without Sampling

Model can be defined as:

  1. we've used Keras' Sequential() to instantiate a model. It takes a group of sequential layers and stacks them together into a single model. Into the Sequential() constructor, we pass a list that contains the layers we want to use in our model.

  2. We've made several Dense layers and a single Dropout layer in this model. We've made the input_shape equal to the Maximum Sequence Length

  3. We defined Embedding Layer on the first layer as the input of that layer.

  4. We added a GRU function layer with 128 neurons.

  5. There are 100 neurons in the dense layer, with relu as the activation function. This is typically determined by testing: more neurons per layer can help extract more features, but can also sometimes work against the goal.

  6. Finally, we have a Dense layer with size as the output layer. It has the Softmax activation function.

  7. Finally, we measure the loss with the categorical cross-entropy function; the efficient Adam optimization algorithm is used to find the weights, and the accuracy metric is calculated and reported each epoch.

We also add some callbacks to the model for early stopping and reducing the learning rate on plateau if we see no improvement in the loss.

Performance of the RNN with LSTM and the RNN with GRU model

LSTM model (Bidirectional) - without Sampling

Model can be defined as:

  1. We created two copies of the hidden layer: one fit on the input sequence as-is, and one on a reversed copy of the input sequence. By default, the output values from these LSTMs are concatenated.

  2. We defined Embedding Layer on the first layer as the input of that layer.

  3. We added a LSTM layer with 128 neurons.

  4. There are 100 neurons in the dense layer, with relu as the activation function. This is typically determined by testing: more neurons per layer can help extract more features, but can also sometimes work against the goal.

  5. Finally, we have a Dense layer with size as the output layer. It has the Softmax activation function.

  6. Finally, we measure the loss with the categorical cross-entropy function; the efficient Adam optimization algorithm is used to find the weights, and the accuracy metric is calculated and reported each epoch.

We also add some callbacks to the model for early stopping and reducing the learning rate on plateau if we see no improvement in the loss.

Performance comparison of the RNN with LSTM, RNN with GRU, and Bidirectional LSTM models

Observation:

The difference between training accuracy and testing accuracy is not high. There is scope for much improvement in deep learning models and the testing accuracies of these models look promising.

Although the models still need substantial tuning, we already observe signs of overfitting.

The low accuracy is suspected to be due to imbalanced dataset used for training and testing.

We have to work on resampling the data to make the models perform better.

Looking at the above results, we see that the RNN-LSTM model has the most room to be tuned without overfitting.

We need to explore ways to improve & fine tune the model performance without overfitting.

Future steps planned for deriving better results

  1. Data Imbalance Rationalisation: Data set will be resampled based on multiple approaches:

    a. Creating a separate single target group for poorly represented groups (perhaps those with 20 or fewer assigned tickets) and then classifying against the highly represented groups.

    b. Then reclassifying the group (the cluster of meagre groups) into the original groups.

    c. Possibly dropping entirely some groups that are very sparsely represented (perhaps those with fewer than 5 observations).

  2. Caution to be taken to avoid data leakage between training and testing runs.
  3. Hyper-parameter tuning for the traditional ML models.
  4. Playing with the learning rate to derive the optimal speed and avoid jumping over minima.
  5. Increasing epochs so the model avoids underfitting.
  6. Trying to find any new feature that is more defining for the assignment groups.
  7. Increasing the number of layers in the deep learning models.
  8. Increasing the number of neurons in the hidden layers.

Model Improvement 2: Utilization of Resampled Training Dataset

Load Required Libraries and Define Common Functions

Grouping of Dataset based on ticket count

The groups with tickets less than 100 are grouped into a new Group - GRP_Rare.

A new dataframe(df_sample2) is created to hold these group data separately.

The dataframe "df_sample1" holds all the groups, with a new column added in which the group name is set to "GRP_Rare" if the corresponding record belongs to a group with ticket count <= 100.

Sample 1: All Groups

Sample 2: Groups with tickets <= 100

Resampling of Training Datasets

Traditional ML Models - with Resampling

Traditional ML Models considered:

Multinomial Naive Bayes - with Resampling
K-Nearest Neighbor (KNN) - with Resampling
Support Vector Machine (SVM) - with Resampling

Linear SVM

Decision Trees - with Resampling
Random Forest Classifier - with Resampling
Logistic Regression - with Resampling
ADA Boost Classifier - with Resampling
Bagging Classifier - with Resampling
XG Boost Classifier - with Resampling

Comparison of Traditional ML Models - with Resampling

Deep Learning Models - with Resampling

Deep Learning Models considered:

Recurrent Neural Network(RNN) - LSTM Model - with Resampling
Recurrent Neural Network(RNN) - GRU Model - with Resampling
LSTM model (Bidirectional) - with Resampling

Comparison of Deep Learning Models - with Resampling

Conclusion on Model Improvement 2

Model Improvement 3: Utilization of Grouping and fastText

Grouping of Dataset based on ticket count

The groups with fewer than 40 tickets are grouped into a new group, GRP_LE40, and all other ticket categories are considered individually for final evaluation & testing. GRP_0 is also considered individually, since it has the highest ticket count, considerably higher than the rest.

But during the model-building process, we grouped the rest of the ticket categories into groups based on ticket counts.

New dataframes are created to hold these group data separately.

The dataframe "df_sample1" holds all the groups.

Sample 1: GRP_0 vs GRP_Others1

Sample 2(df_sample2): GRP_Others1 - GRP_LE100, GRP_LE289GT200, GRP_LE200GT100 & GRP_8

Sample 3(df_sample3): In GRP_LE289GT200 - Groups with tickets counts <= 289 & > 200

Sample 4(df_sample4): Groups with ticket counts <= 200 & > 100(GRP_LE200GT100)

Sample 5(df_sample5): Groups with ticket counts <= 100(GRP_LE100)

Sample 6(df_sample6): Groups with ticket counts <= 100 & > 40(GRP_LE100GT40)

Utilization of Classifications Techniques on df_sample1(GRP_0 vs GRP_Others1)

Classification using FastText on unsampled dataset1(df_sample1: GRP_0 vs GRP_Others1)
Evaluation of Model set1(df-sample1: GRP_0 vs GRP_Others1)
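A sketch of the fastText workflow, assuming the fasttext Python package. The `__label__` prefix is fastText's supervised-input convention; the file name, sample rows, and hyperparameters are illustrative, and the training call is wrapped in a function since it needs the package and a training file:

```python
def to_fasttext_line(label, text):
    """Format one record in fastText supervised format: '__label__GRP text'."""
    return "__label__%s %s" % (label, text.replace("\n", " "))

rows = [("GRP_0", "password reset request"),
        ("GRP_8", "nightly job failed")]
lines = [to_fasttext_line(label, text) for label, text in rows]
print(lines[0])  # __label__GRP_0 password reset request

def train(path="tickets.train"):
    """Train a supervised fastText classifier on a file of such lines."""
    import fasttext  # assumed package: pip install fasttext
    model = fasttext.train_supervised(input=path, epoch=25, wordNgrams=2)
    return model     # model.predict("forgot my password") -> labels, probabilities
```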

Utilization of Classifications Techniques on Sample 2(df_sample2: GRP_Others1 - GRP_LE100, GRP_LE289GT200, GRP_LE200GT100 & GRP_8)

Classification using FastText on Sample 2(df_sample2: GRP_Others1 - GRP_LE100, GRP_LE289GT200, GRP_LE200GT100 & GRP_8)
Evaluation of Model set2(df_sample2: GRP_Others1 - GRP_LE100, GRP_LE289GT200, GRP_LE200GT100 & GRP_8)

Utilization of Classifications Techniques on Sample 3(df_sample3: In GRP_LE289GT200 - Groups with tickets counts <= 289 & > 200)

Classification using FastText on Sample 3(df_sample3: In GRP_LE289GT200 - Groups with tickets counts <= 289 & > 200)
Evaluation of Model set3(df_sample3: In GRP_LE289GT200 - Groups with tickets counts <= 289 & > 200)

Utilization of Classifications Techniques on Sample 4(df_sample4): Groups with ticket counts <= 200 & > 100(GRP_LE200GT100)

Classification using FastText on Sample 4(df_sample4): Groups with ticket counts <= 200 & > 100(GRP_LE200GT100)
Evaluation of Model 4(df_sample4: Groups with ticket counts <= 200 & > 100(GRP_LE200GT100))

Utilization of Classifications Techniques on Sample 5(df_sample5): Groups with ticket counts <= 100(GRP_LE100)

Classification using FastText on Sample 5(df_sample5: Groups with ticket counts <= 100)
Evaluation of Model 5(df_sample5: Groups with ticket counts <= 100)

Utilization of Classifications Techniques on Sample 6(df_sample6): Groups with ticket counts <= 100 & > 40(GRP_LE100GT40)

Classification using FastText on Sample 6(df_sample6: Groups with ticket counts <= 100 & > 40)
Evaluation of Model 6(df_sample6: Groups with ticket counts <= 100 & > 40)

All Groups Models

Conclusion

We obtained better results with the grouping and fastText modeling technique we employed for text classification.

One could conclude from this study that text classification doesn't have a single definite method for building models. We have to employ several techniques and find the set of techniques that serves our model-building purpose.